About PanKB


The microbial world holds vast potential for advancements in diverse fields such as food production, human health, and ecological sustainability. PanKB is a pangenomic knowledgebase working to empower practitioners to leverage microbial functions beyond those of a select few model organisms. PanKB provides:

  • A growing dataset of pangenomic results,
  • Interactive data reports and analytics for exploration, analysis, and potential new discoveries,
  • Global database search for genes, pathways, products, species, and more,
  • Alleleomes describing the amino acid variants across gene alleles,
  • Dataset downloads providing access to raw data, search results, and pangenomic analytics for custom analysis,
  • A bibliome of open-access pangenomic publications,
  • A specialized LLM-powered chat interface that accelerates knowledge acquisition by giving detailed answers to queries about publication content, with supporting references, and is designed to avoid hallucinated content.

Using PanKB


Pangenomes

PanKB includes multiple interactive analytics and tables that give an overview of a pangenome's contents. These analytics can be found on, or reached from, each individual pangenome's page.

Using Lactiplantibacillus plantarum as an example:

  • Overview Page shows the presence/absence gene matrix, COG category distribution, core/accessory/rare gene categorizations, and pangenome openness of the Lactiplantibacillus plantarum pangenome.
  • Gene Annotation Table provides detailed annotation of all gene clusters found in the Lactiplantibacillus plantarum pangenome.
  • Phylogenetic Tree displays the phylogenetic structure of the species Lactiplantibacillus plantarum.
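As a rough illustration of how a presence/absence matrix maps onto core, accessory, and rare gene categories, the sketch below classifies each gene cluster by the fraction of genomes that carry it. The thresholds (core at or above 99%, rare below 15%), the function name `classify_gene_clusters`, and the toy data are all hypothetical choices for illustration; PanKB's actual cutoffs and methods may differ.

```python
from typing import Dict, List

# Hypothetical frequency cutoffs for illustration; PanKB's own thresholds may differ.
CORE_MIN = 0.99  # core: present in at least 99% of genomes
RARE_MAX = 0.15  # rare: present in fewer than 15% of genomes


def classify_gene_clusters(matrix: Dict[str, List[int]]) -> Dict[str, str]:
    """Classify each gene cluster as core, accessory, or rare.

    `matrix` maps a gene-cluster ID to a 0/1 presence/absence vector
    with one entry per genome in the pangenome.
    """
    categories = {}
    for cluster, presence in matrix.items():
        # Fraction of genomes in which this gene cluster is present.
        frequency = sum(presence) / len(presence)
        if frequency >= CORE_MIN:
            categories[cluster] = "core"
        elif frequency < RARE_MAX:
            categories[cluster] = "rare"
        else:
            categories[cluster] = "accessory"
    return categories


# Toy matrix: 4 gene clusters across 10 genomes (made-up data).
toy_matrix = {
    "cluster_1": [1] * 10,           # present in every genome
    "cluster_2": [1] * 6 + [0] * 4,  # present in 60% of genomes
    "cluster_3": [1] + [0] * 9,      # present in 10% of genomes
    "cluster_4": [1] * 9 + [0],      # present in 90% of genomes
}
print(classify_gene_clusters(toy_matrix))
# {'cluster_1': 'core', 'cluster_2': 'accessory', 'cluster_3': 'rare', 'cluster_4': 'accessory'}
```

Note that a gene present in 90% of genomes still counts as accessory under these strict cutoffs; where the core boundary is drawn changes the category sizes noticeably.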


Alleleomes

Pangenomes also serve as the foundation for further large-scale analyses, and PanKB is actively integrating their novel results. Recent pangenome-scale analyses of sequence variants, termed alleleomics, have shown unique value in narrowing the search space of feasible genetic variants in E. coli. PanKB currently includes the alleleomes of all of its pangenomes. Alleleome analytics can be found on pangenome pages and on individual gene pages.

Using Lactiplantibacillus plantarum as an example:

  • The genome-wide alleleome can be found on the Overview Page.
  • For a single gene's alleleome, for example accA2 in Lactiplantibacillus plantarum, users can click accA2 in the Lactiplantibacillus plantarum Gene Annotation Table to open the accA2 gene page, which shows the alleleome of accA2.
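To make the idea of an alleleome more concrete, the following sketch summarizes amino acid variation at each position of a set of aligned protein sequences for one gene, reporting the most common (dominant) residue and the frequency of every other residue seen there. It is a simplified illustration, not PanKB's pipeline: it assumes a pre-computed multiple sequence alignment of equal-length strings, skips gap characters ('-'), and uses a hypothetical function name, `summarize_alleleome`, with made-up sequences.

```python
from collections import Counter
from typing import Dict, List


def summarize_alleleome(aligned_alleles: List[str]) -> List[Dict]:
    """Summarize amino acid variation per alignment position.

    Assumes all sequences are already aligned to the same length;
    gap characters ('-') are ignored when counting residues.
    """
    length = len(aligned_alleles[0])
    if any(len(seq) != length for seq in aligned_alleles):
        raise ValueError("all aligned sequences must have the same length")

    summary = []
    for pos in range(length):
        # Count residues at this column, excluding gaps.
        counts = Counter(seq[pos] for seq in aligned_alleles if seq[pos] != "-")
        total = sum(counts.values())
        if total == 0:
            continue  # column is all gaps
        # Dominant residue = most frequent one (ties go to the first seen).
        dominant, _ = counts.most_common(1)[0]
        variants = {aa: n / total for aa, n in counts.items() if aa != dominant}
        summary.append({"position": pos + 1, "dominant": dominant, "variants": variants})
    return summary


# Made-up aligned alleles of a short peptide, for illustration only.
alleles = ["MKVL", "MKIL", "MRVL", "MKVL"]
for row in summarize_alleleome(alleles):
    print(row)
```

Here positions 2 and 3 each show one variant (R and I) at a frequency of 0.25, while positions 1 and 4 are fully conserved.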


Data Accessibility

PanKB offers several ways to access its data. Users can reach all of PanKB's data through its navigation links, quickly find specific data through the global database search available on most pages, and download the database's raw data from the various analytics hosted on database pages.


Data Application

Combined, PanKB's features enable valuable workflows for enzyme and strain engineering. These include identifying genes for new enzyme production or for reintroduction into strains, pinpointing precise gene edits to modify activity, discovering and optimizing valuable pathways, and selecting optimal starting strains. Strain engineering currently relies heavily on model organisms or familiar strains; PanKB's features and data let strain engineers begin leveraging pangenomic data for targeted bioengineering.


PanKB LLM

Scientific progress often requires extensive literature review, a traditionally time-consuming process. Large Language Models (LLMs) offer a potential solution by aggregating and summarizing knowledge across documents. PanKB includes an LLM chatbot (AI Assistant) focused on an open-access pangenomic bibliome, designed to answer in-depth questions on pangenomics accurately, cite relevant articles, and avoid hallucinating inaccurate content. This feature is an initial experiment in combining an LLM with a specialized scientific database to accelerate scientific knowledge acquisition through automated knowledge extraction.

  • PanKB LLM can be accessed by clicking on the AI Assistant link in the navbar.


Tools and Methods


Pangenome Analysis: BGCFlow

Interactive Visualization: D3.js, Plotly.js, Highcharts, hotmap.js, MSAViewer

Front-end Web Frameworks: Bootstrap, jQuery

Back-end Web Framework: Django

Database: Azure Cosmos DB for MongoDB


Contact Us


If you have questions or find any bugs in the database, please contact Patrick Phaneuf.


Funding


This work was funded by the Novo Nordisk Foundation through the Center for Biosustainability at the Technical University of Denmark (NNF Grant Number NNF20CC0035580).